Nebula 1

Text File | 1994-01-28 | 14.3 KB | 228 lines

@BT [TITLE]BYTE's UNIX Benchmarks
[DEK]Separating fact from fiction in the exploding UNIX empire
[TOC]Before you jump into the UNIX pool, see how your favorite system stacks up against the rest of the pack.

Ben Smith

In making purchase decisions, it's difficult to know whom to believe. Each vendor claims, predictably, that its products are better than the competition's, but how does one prove, or debunk, these claims? Cost and performance typically top the list of considerations for those seeking to purchase equipment, and while cost can be easily compared, performance cannot. Comparing costs without analyzing each system's relative value is a worthless exercise.

When DOS became popular, it allowed for the development of performance-measurement programs (benchmarks) that would run on any system that ran DOS. BYTE's lab technicians set about creating their own, and the BYTE DOS benchmarks were born. Dozens of systems have been clocked using these facilities, and each review of a new DOS-based system includes the results of these benchmarks.

This is all very well, but while DOS is installed on a great many systems, it is no longer the !ITAL!only!ENDITAL! popular multi-platform operating system. User demands for greater expandability, better performance, and multi-tasking have turned UNIX systems into one of the fastest-growing segments of the market. When UNIX stepped from minicomputers to workstations, it established itself as the de facto OS for an exciting new breed of machine. Now, with solid implementations for affordable Intel- and Motorola-based platforms, UNIX is making a name for itself in the PC realm.

As UNIX finds its way into the mainstream, it is necessary to have the tools to objectively measure not only the performance of various hardware platforms, but of different versions of UNIX as well.

!SUBHED!UNIX Is Not MS-DOS!ENDSUBHED!
Conceptually, BYTE's UNIX benchmarks are the same as BYTE's MS-DOS benchmarks: We have combined evaluation of both low-level operations and high-level application-type programs to highlight the performance of the entire system.

But UNIX is considerably different from MS-DOS. In the first place, it is a !ITAL!multi!ENDITAL!-tasking, !ITAL!multi!ENDITAL!-user operating system. It is also portable, able to run on many different kinds of computers. MS-DOS is a !ITAL!single!ENDITAL!-tasking, !ITAL!single!ENDITAL!-user operating system, and it is intended to run on essentially one kind of computer, an IBM PC or PC ``clone,'' utilizing a specific class of processor from Intel.

As a result, the UNIX benchmarks differ from their MS-DOS counterparts. Even though there are some equivalent low-level tests, you will find that even these run differently; the popular Dhrystone benchmark commonly gives different results, on the same hardware, when run under DOS and under UNIX. The reason? Different compilers are being used, and the underlying operating systems and services are wildly different.

Another important difference is that Microsoft is the only real source of DOS; other suppliers simply repackage Microsoft's basic operating system under other names. In contrast, there are many different kinds of UNIX, and while similarities exist (the core UNIX implementations from Dell, Everex, and Interactive Systems are virtually the same), there are UNIX and UNIX-like operating systems that differ greatly from one another. Conclusion: The UNIX benchmarks evaluate the implementation of UNIX and the resident compiler as well as the hardware on which it is running (the MS-DOS and Apple Macintosh benchmarks use a common compiler, the public-domain Small C).

With so many variables, what is constant? Well, we have established a baseline: SCO Xenix 386 version 2.3.1 running on the Everex 386/33 with 4 Mbytes of RAM and an 80387 math coprocessor.
While Xenix isn't UNIX per se (because AT&T decides which implementations may be called ``UNIX''), it is more popular than any other PC UNIX implementation. It is specifically designed for 80386-based computers with full 32-bit memory access. The Everex 386/33 was chosen because it is one of today's highest-performance 386 computers properly configured to run the full 32-bit operating system. (Some 386 computers cannot access memory through single 32-bit operations; small matter if you are just running MS-DOS, a 16-bit operating system, but serious if you want to run UNIX.) This combination of system and OS is timely, but we'll continue to adjust the baseline as needed to reflect the installed PC and workstation UNIX base.

!SUBHED!The Low-Level Bench Programs!ENDSUBHED!

The BYTE UNIX benchmarks consist of eight groups of programs: arithmetic, system calls, memory operations, disk operations, Dhrystone, database operations, system loading, and miscellaneous. These can be roughly divided into the low-level tests (arithmetic, system calls, memory, disk, and Dhrystone) and high-level tests (database operations, system loading, and the C-compiler test that is part of the miscellaneous set).

The Dhrystone test is known more formally as ``Dhrystone 2''. It performs no floating-point operations, but it does involve arrays, character strings, indirect addressing, and most of the non-floating-point instructions that might be found in an application program. It also includes conditional operations and other common program flow controls. The output of the test is the number of Dhrystone loops per second. It is used in the BYTE benchmarks because of its wide selection of operations and because it is one of the most widely run benchmark programs.

A future version of the BYTE UNIX benchmarks will include the Whetstone benchmark test program as well.
The Whetstone benchmark is conceptually similar to the Dhrystone, but with an emphasis on math; it is a mix of floating-point and integer arithmetic, function calls, array operations, conditionals, and transcendental function calls.

All the arithmetic tests share the same source code, with a different data type substituted for the operands in each: register, short, int, long, float, double, and an empty loop for calculating the overhead required by the program. The actual test involves assignment, addition, subtraction, multiplication, and division. Very simple. But don't bother running the float and double precision tests unless you have a math coprocessor; what takes a system with a math coprocessor 15 seconds may take an unaided processor 30 minutes or more!

The system call tests are: system call overhead, pipe throughput, pipe context switching, spawning of child processes, execl (replacement of the current process by a new process), and file read, write, and copy. The system call overhead test evaluates the time required to do iterations of !MONO!dup()!ENDMONO!, !MONO!close()!ENDMONO!, !MONO!getpid()!ENDMONO!, !MONO!getuid()!ENDMONO!, and !MONO!umask()!ENDMONO! calls.

The pipe throughput test has no counterpart in real-world programming; in it, a single process opens a pipe (an inter-process communications channel that works rather like its plumbing namesake) to itself and spins a megabyte around this short loop. You might call this the pipe overhead test. The context switching pipe test is more like a real-world application; the test program spawns a child process with which it carries on a bi-directional pipe conversation.

The spawn test creates a child process that exits immediately after the !MONO!fork()!ENDMONO! that created it. The process is repeated over and over. Similarly, the exec test is a process that repeatedly changes to a new incarnation.
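
To make the exec test concrete, here is a minimal sketch in Python rather than the C of the actual suite. It is not the BYTE source: the script text, the file name !MONO!exec_chain.py!ENDMONO!, and the function !MONO!time_exec_chain!ENDMONO! are hypothetical names chosen for illustration, and it assumes a POSIX system where !MONO!os.execv!ENDMONO! replaces the calling process.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for the exec test program (not the BYTE source).
# It replaces itself with a fresh copy until its counter argument hits zero.
CHAIN_SOURCE = """\
import os
import sys

remaining = int(sys.argv[1])
if remaining > 0:
    # Become a new incarnation, handing it the decremented counter.
    os.execv(sys.executable, [sys.executable, sys.argv[0], str(remaining - 1)])
print("done")
"""


def time_exec_chain(iterations):
    """Run the self-replacing script; return (final output, elapsed seconds)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "exec_chain.py")
        with open(script, "w") as handle:
            handle.write(CHAIN_SOURCE)
        start = time.perf_counter()
        finished = subprocess.run(
            [sys.executable, script, str(iterations)],
            capture_output=True, text=True, check=True,
        )
        elapsed = time.perf_counter() - start
    return finished.stdout.strip(), elapsed


if __name__ == "__main__":
    output, seconds = time_exec_chain(20)
    print(f"{output}: 20 exec calls in {seconds:.3f} s")
```

Because every incarnation here also pays for interpreter start-up, the absolute timing says little about the cost of the exec call itself; the sketch only illustrates the control flow.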
One of the arguments passed to the new incarnation is the number of remaining iterations (there has to be some control, after all).

The file read, write, and copy tests capture the number of characters that can be written, read, and copied in a specified time (the default is 10 seconds). If you run this test with the shortest time setting (1 second), you should see a significantly higher value for all operations if your system implements disk caching. Be sure you have plenty of disk space before you run this test.

!SUBHED!The High-Level Bench Programs!ENDSUBHED!

To qualify as a high-level test, the test must involve operations that a real-world application program might employ, including heavy use of the CPU and disk. At the time of writing, we have implemented only the system loading and database tests, but we will be adding several new tests in the months ahead.

The system loading test is a shell script that is run by 1, 2, 4, and 8 concurrent processes. The script consists of an alphabetic sort of one file to another; taking the octal dump of the result and doing a numeric sort to a third file; running grep on the result of the alphabetic sort; !MONO!tee!ENDMONO!ing the result to a file and to !MONO!wc!ENDMONO! (word count); writing the final result to a file; and removing all of the resulting files. This script was used in the original BYTE UNIX benchmarks (1983), but the source file is several orders of magnitude larger than the original.

The C compile-and-link test is nothing more than that.

The database operations consist of random read, write, and add operations on a database file. The operations are handled by a server process; the requests come from client processes. The test is run with 1, 2, 4, and 8 client processes. The test employs semaphores and message queues. Semaphores are being used less and less these days. BSD systems use sockets in place of both of these System V.3 IPC facilities. System V.4 offers both.
The database test is being rewritten using sockets, but since Xenix doesn't implement sockets, our baseline configuration will become obsolete as soon as we replace the database test. Just another one of those little problems in trying to create journalistic computer benchmarks: any program that has been fully debugged is probably obsolete [Murphy et al.].

!SUBHED!Miscellany!ENDSUBHED!

The remaining tests are in the miscellaneous group: Tower of Hanoi (a test of recursive operations) and a test of the UNIX arbitrary-precision calculator (!MONO!dc!ENDMONO!) computing the square root of two to 99 decimal places. No doubt, we will be adding tests to this suite as we see the need to test and evaluate from different perspectives.

!SUBHED!Problems in the Modern World!ENDSUBHED!

The major problem we have had with developing the UNIX benchmark programs is designing them so that they fairly reflect the strengths and weaknesses of all the systems on which we anticipate using them. For example, the operations should allow RISC machines to give appropriately high performance for the sorts of operations that RISC is good for, and should also illustrate improvements provided by faster bus speeds, better math coprocessors, and the like. In the case of RISC, the efficiency of the compiler is of utmost importance; RISC compilers must rearrange instructions to take advantage of instruction pipelining (for an overview of RISC, see BYTE, May 1988).

The majority of the UNIX systems that we look at employ disk caching. This is especially important because modern UNIX includes swapping and paging out to disk when there is insufficient memory for a task or for the number of tasks running. It is an interesting exercise to run the disk file operations test with increasingly large files and note the point at which performance drops.

!SUBHED!How They Work!ENDSUBHED!

A 400-line Bourne shell script (!MONO!Run!ENDMONO!) administers the benchmarking system.
After the evaluation of the command line options, the benchmarking operation for each test has three stages: parameter setup, timing the execution of the test, and calculation/formatting operations (see Figure 1). After !MONO!Run!ENDMONO! determines the parameters for the test, it sends a formatted description to the output file. !MONO!Run!ENDMONO! then invokes the specific test by means of the UNIX command !MONO!time!ENDMONO!. The output of !MONO!time!ENDMONO! and any output from the test itself end up in a raw data file. Most tests are run six times so that any variance can be averaged out. On completion of a set of tests, !MONO!Run!ENDMONO! invokes a cleanup script, which does the statistical calculations on the raw data using the !MONO!awk!ENDMONO! text-processing language.

The greater part of the benchmark programs are written in C and are compiled on the test machine prior to running the tests.

!SUBHED!Using the Results!ENDSUBHED!

If all you need is a raw measure of performance, then feel free to use the Dhrystone and Whetstone tests as indices of just that. But if you want to use the benchmarks to evaluate a machine's ability to serve some real need, then you should do the following:

1. Analyze your requirements as far as the type of computing, amount and type of communications I/O, and amount and type of disk I/O.
2. Score the subject machines using weighting factors that reflect your requirements.
3. Generate a price vs. performance plot.
4. Use the price vs. performance plot, along with information about the reliability and serviceability of the hardware, to make your choice.

Step 4 is really more of an art than anything else. It is extremely important, however, not to rely on price vs. performance alone.

We use our UNIX benchmarks for doing a rough analysis and comparison of divergent machines. (See Figure 2, ``UNIX Machines Tested.'') As you can see, we even go so far as to generate a single index number, a sort of reduction of all of the benchmark tests to a single value.
This index is generated by summing the individual indices of the Dhrystone test, the floating-point test, the shell test with eight concurrent processes, the C compiler time, the !MONO!dc!ENDMONO! routine, and the Tower of Hanoi time. By definition, the combined index for the baseline machine is six. Indices above six imply better overall performance than the baseline machine; indices below six, worse performance.

Always keep in mind that having a single index rating for a machine may make good cocktail conversation, but it is incredibly simplistic. It is like reducing a complex sculptural shape to a single point; you can no longer tell what you are looking at. This number doesn't reflect any real-world use of a UNIX system. However, the index is devised so that it gives an overall indication of different kinds of system operations and so is valuable to our reviews.

BYTE's UNIX benchmarking suite is small enough to port easily to any UNIX system, yet diverse and flexible enough to be useful for a wide spectrum of benchmarking requirements. Besides, the benchmarks are in the public domain, so they can be obtained for little, if any, cost. What better reason do you need to use them?
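
As an illustration of the combined index described above, the following Python sketch sums six per-test indices. It rests on assumptions the article does not spell out: that each individual index is a ratio against the baseline machine (so each baseline index is 1 and the sum is 6), and that time-based results are inverted so that a larger index always means better performance. Every number and name in it is hypothetical, not a measurement from BYTE's lab.

```python
# Hypothetical baseline results (NOT figures from the article).
BASELINE = {
    "dhrystone": 10000.0,  # loops per second (a rate: higher is better)
    "float_arith": 20.0,   # seconds (assumed time-based: lower is better)
    "shell_8": 60.0,       # seconds
    "c_compile": 40.0,     # seconds
    "dc_sqrt2": 5.0,       # seconds
    "hanoi": 8.0,          # seconds
}

# Assumption: only Dhrystone reports a rate; the other five report times.
RATE_TESTS = {"dhrystone"}


def single_index(name, result, baseline):
    """Ratio against the baseline, oriented so that bigger means faster."""
    if name in RATE_TESTS:
        return result / baseline
    return baseline / result


def combined_index(results, baseline=BASELINE):
    """Sum the six per-test indices; the baseline machine scores 6.0."""
    return sum(single_index(name, results[name], baseline[name])
               for name in baseline)


if __name__ == "__main__":
    # A hypothetical machine twice as fast as the baseline on every test.
    faster = {"dhrystone": 20000.0, "float_arith": 10.0, "shell_8": 30.0,
              "c_compile": 20.0, "dc_sqrt2": 2.5, "hanoi": 4.0}
    print(combined_index(BASELINE))  # baseline against itself
    print(combined_index(faster))
```

The weighting factors suggested in step 2 of the list above would slot in naturally here: multiply each per-test index by a weight reflecting your own workload before summing.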